Bagging Using Statistical Queries
Abstract
Bagging is an ensemble method that relies on random resampling of a data set to construct models for the ensemble. When only statistics about the data are available, but no individual examples, the straightforward resampling procedure cannot be implemented. The question is then whether bagging can somehow be simulated. In this paper, we propose a method that, instead of computing certain heuristics (such as information gain) from a resampled version of the data, estimates the probability distribution of these heuristics under random resampling and then samples from this distribution. The resulting method is not entirely equivalent to bagging because it ignores certain dependencies among statistics. Nevertheless, experiments show that this “simulated bagging” yields accuracy similar to that of bagging, while being as efficient and more generally applicable.
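The idea of sampling a heuristic's value without access to individual examples can be illustrated with a minimal sketch. It is not the paper's own estimation procedure; it relies only on the standard fact that, when n examples are resampled with replacement, the cell counts of a contingency table are multinomially distributed with n trials and probabilities count / n. The function names and the `table[branch][cls]` layout are assumptions made for this illustration.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(table):
    """Information gain of a test, given table[branch][cls] = count."""
    n = sum(sum(row) for row in table)
    if n == 0:
        return 0.0
    class_totals = [sum(col) for col in zip(*table)]
    remainder = sum(sum(row) / n * entropy(row) for row in table)
    return entropy(class_totals) - remainder


def sample_resampled_table(table, rng):
    """Draw the contingency table that a bootstrap resample would produce.

    Only the counts are needed: the resampled cell counts follow a
    multinomial distribution with n trials and probabilities count / n.
    """
    cells = [(b, c) for b, row in enumerate(table) for c in range(len(row))]
    weights = [table[b][c] for b, c in cells]
    n = sum(weights)
    if n == 0:
        return [list(row) for row in table]
    drawn = Counter(rng.choices(cells, weights=weights, k=n))
    return [[drawn[(b, c)] for c in range(len(row))]
            for b, row in enumerate(table)]


def simulated_gain(table, rng):
    """One simulated draw of the information gain under resampling."""
    return information_gain(sample_resampled_table(table, rng))


# Hypothetical counts: rows are the branches of a test, columns are classes.
table = [[30, 10], [5, 55]]
rng = random.Random(0)
gains = [simulated_gain(table, rng) for _ in range(5)]
```

Each ensemble member would draw its own gain for every candidate test and pick the best one. Drawing each test's table independently, as this sketch does, discards the correlation that a single shared resample would induce between tests, which is one example of the kind of dependency the abstract says is ignored.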
Similar resources
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Combining Bagging and Additive Regression
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of regression models using the same learning algorithm as base-learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in t...
Bagging Binary Predictors for Time Series
Bootstrap aggregating, or bagging, introduced by Breiman (1996a), has proved effective at improving unstable forecasts. Theoretical and empirical works using classification, regression trees, and variable selection in linear and non-linear regression have shown that bagging can generate substantial prediction gains. However, most of the existing literature on bagging has been limited to t...
Roughly Balanced Bagging for Imbalanced Data
Imbalanced class problems appear in many real applications of classification learning. We propose a novel sampling method to improve bagging for data sets with skewed class distributions. In our new sampling method, “Roughly Balanced Bagging” (RB Bagging), the numbers of samples in the largest and smallest classes are different, but they are effectively balanced when averaged over all subsets, wh...
Investigating the Effect of Underlying Fabric on the Bagging Behaviour of Denim Fabrics (RESEARCH NOTE)
Underlying fabrics can change the appearance, function and quality of a garment, and can also add considerably to its longevity. Nowadays, with the increasing use of various types of fabrics in the garment industry, their resistance to bagging is of great importance in determining the effectiveness of textiles under various forces. The current paper investigated the effect of unde...